1 Little Rock Child Maltreatment

1.1 Background

This is an exploratory analysis of a dataset reporting child maltreatment incidents in Little Rock from June 2015 to June 2018. We show kernel density maps of the incidents as well as their distribution over time. We also provide some insight into how the different categories of allegations are associated with each other by finding 'frequent itemsets', and finally present the correlation structure showing how child maltreatment relates to a few built environment factors.

1.2 Map of Incidents

First, we geocode the addresses using Google's API (via the ggmap package) and plot the incidents on a map as a point pattern, using the latitudes and longitudes returned by ggmap.

The original dataset contains several different city names among the reporting addresses. For this report, we therefore filter the city names to "Little Rock" and use the Little Rock location boundaries to drop addresses that fall outside the city. The map below shows the incident points, after this filtering, together with their kernel density estimate rendered as a heatmap.
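The analysis itself does this filtering in R, but the boundary step can be sketched as a simple point-in-bounding-box check. The sketch below is in Python, and the coordinates are illustrative placeholders, not the actual Little Rock city limits.

```python
# Minimal sketch: keep only geocoded points that fall inside a bounding box
# approximating the Little Rock city limits (coordinates are illustrative,
# not the real boundary used in the report).

LR_BOUNDS = {"lat_min": 34.62, "lat_max": 34.82,   # hypothetical bounds
             "lon_min": -92.52, "lon_max": -92.15}

def inside_little_rock(lat, lon, b=LR_BOUNDS):
    """Return True if (lat, lon) lies within the bounding box."""
    return (b["lat_min"] <= lat <= b["lat_max"]
            and b["lon_min"] <= lon <= b["lon_max"])

points = [(34.74, -92.29), (35.20, -91.80)]  # second point lies outside
kept = [p for p in points if inside_little_rock(*p)]
```

A real implementation would test against the city polygon rather than a box, but the logic of dropping out-of-boundary geocodes is the same.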

1.3 Dates of Incidents

1.3.1 Weekly Counts

The following figure shows the weekly and yearly child maltreatment incidence counts over the study period, which runs from June 2015 to June 2018.

1.3.2 Animated Map

This animation shows how the incidents moved over time: the points are drawn on the map of Little Rock year by year, from 2015 to 2018. Note that both 2015 and 2018 show fewer events because the data cover only half of each of those years; this does not imply a lower rate of child maltreatment events.

2 Exploratory Analysis: Frequent Itemset Mining

The child maltreatment data contain a number of binary variables recording different types of allegations. A first step in understanding the pattern of these categories is to investigate which categories or sub-types of child maltreatment co-occur and which are more exclusive. To this end, we use a data mining technique called "frequent itemset mining", which is typically used in market basket analysis, click-stream analysis, and web link analysis. The method aims to find regularities in a binary occurrence matrix, which are then used to uncover patterns in the shopping behavior of customers of supermarkets, mail-order companies, online shops, etc.

More specifically, it is used to find sets of products that are frequently bought together. The patterns that are uncovered are expressed as association rules, for example: If a customer buys bread and wine, then she/he will probably also buy cheese.

2.0.1 Building Blocks

Formally, our goal here is to identify "building blocks" of child maltreatment from the binary incidence variables, observed as an \(m \times n\) sparse binary matrix, where \(m\) and \(n\) denote the number of observations and the number of different allegations/events, respectively. The idea is to treat these data as transaction data, where each sample represents a transaction and each allegation category an item. In the language of itemset mining, we have an item set \(I\) and a transaction set \(D\), and each transaction in \(D\) contains a subset of the items in \(I\). In our case, each subject may have 1's on a subset of the list of columns.

First, we define some preliminary terminology for explaining the results. Note that these terms were developed by researchers in data mining and computer science, so a "rule" does not imply any causal link.

  • A rule \(X \Rightarrow Y\) requires \(X, Y \subseteq I\) and \(X \cap Y = \emptyset\), where \(X\) is called the LHS or 'antecedent' and \(Y\) is called the RHS or 'consequent'.
  • The support \(supp(X)\) of an itemset (think, cause-set) \(X\) is defined as the proportion of samples in the data set that contain the itemset.
  • The confidence of a rule is defined as \(conf(X \Rightarrow Y) = supp(X \cup Y)/supp(X)\), and mimics the conditional probability \(P(Y \mid X)\).
  • A third interest measure, lift, is defined as \(lift(X \Rightarrow Y) = supp(X \cup Y)/(supp(X) \times supp(Y))\); it measures the deviation of the support of the rule \(X \Rightarrow Y\) from the support expected under independence, so higher lift means stronger association.
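These three measures can be computed directly from the definitions. The sketch below works through them in Python on a toy transaction set; the allegation labels and the data are made up for illustration only.

```python
# Toy transactions: each row is the set of allegation categories flagged
# for one report (labels and data are illustrative, not the report's data).
transactions = [
    {"SexualContact", "SexualAbuse"},
    {"SexualContact", "SexualAbuse"},
    {"Neglect", "Abuse"},
    {"Neglect"},
    {"SexualAbuse"},
]
n = len(transactions)

def support(itemset):
    """Proportion of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

def confidence(X, Y):
    """conf(X => Y) = supp(X ∪ Y) / supp(X), mimicking P(Y | X)."""
    return support(X | Y) / support(X)

def lift(X, Y):
    """lift(X => Y) = supp(X ∪ Y) / (supp(X) * supp(Y))."""
    return support(X | Y) / (support(X) * support(Y))

X, Y = {"SexualContact"}, {"SexualAbuse"}
# supp(X) = 2/5, supp(Y) = 3/5, supp(X ∪ Y) = 2/5,
# so conf(X => Y) = 1.0 and lift(X => Y) = 5/3 > 1 (positive association).
```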

An association rule is a rule that surpasses user-specified minimum support and minimum confidence thresholds; to select the interesting association rules, we further filter (or rank) them by an additional interest measure, e.g. lift. There are other interest measures, such as the chi-square measure, conviction, and leverage, but we skip them for the sake of simplicity.

2.0.2 Item Frequency Plot

To see which items are important in the data set we can use itemFrequencyPlot. To reduce the number of items, we only plot the frequency of items with a support greater than 5% (via the support parameter).
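What itemFrequencyPlot computes (before drawing the bar chart) is just each item's support, with items below the threshold dropped. A minimal Python sketch with made-up counts:

```python
from collections import Counter

# Made-up toy data: 20 transactions with three illustrative allegation
# labels; the real analysis uses the arules package in R.
transactions = [{"Neglect"}] * 12 + [{"Abuse"}] * 7 + [{"SexualAbuse"}] * 1
n = len(transactions)

# Tabulate each item's support, then apply the 5% support filter.
counts = Counter(item for t in transactions for item in t)
freq = {item: c / n for item, c in counts.items()}            # item supports
frequent = {item: s for item, s in freq.items() if s > 0.05}  # support > 5%
# sorted(frequent, key=frequent.get, reverse=True) gives the plotting order
```

Here "SexualAbuse" has support exactly 5% (1/20) and is dropped by the strict threshold, mirroring how a support cutoff thins out the plot.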

2.1 Association rules using the Apriori algorithm

We find all rules with a minimum support of 1% and a minimum confidence of 0.5, using the Apriori algorithm 1. One way to identify interesting association rules is to look at rules with known relationships and study their association strengths, or interest measures, such as lift, confidence, and support. The table below shows the top rules by support and also reports other interest measures, such as the chi-square test of association. To be specific, in the following table the two sub-categories "Sexual Contact" and "Broad-Allegation-Sexual-Abuse" co-occur in 10% of all cases, and the corresponding lift is 7.23, so these two items co-occur about 7 times more often than would be expected if they were independent. This is also reflected in the high value of the chi-square test statistic (3591.31). It appears that the items on the right-hand side (broad allegations) are indeed the broader categories of these rules.
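As a rough illustration of what the apriori() call produces, the brute-force sketch below enumerates frequent itemsets and derives rules passing the same thresholds (minimum support 1%, minimum confidence 0.5) on a toy dataset. The real Apriori algorithm prunes the candidate space far more cleverly; this sketch only shows the rule semantics, and the data are made up.

```python
from itertools import combinations

# Toy transactions (illustrative labels, not the report's data).
transactions = [
    {"SexualContact", "SexualAbuse"},
    {"SexualContact", "SexualAbuse"},
    {"Neglect", "Abuse"},
    {"Neglect"},
    {"SexualAbuse"},
]
n = len(transactions)
items = sorted(set().union(*transactions))

def supp(s):
    """Support: proportion of transactions containing itemset s."""
    return sum(s <= t for t in transactions) / n

# 1. Enumerate every itemset meeting the minimum support (1%).
frequent = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if supp(set(c)) >= 0.01]

# 2. Split each frequent itemset into X => Y rules passing min confidence.
rules = []
for fs in frequent:
    if len(fs) < 2:
        continue
    for k in range(1, len(fs)):
        for lhs in combinations(fs, k):
            X, Y = frozenset(lhs), fs - frozenset(lhs)
            conf = supp(X | Y) / supp(X)
            if conf >= 0.5:
                lift = conf / supp(Y)
                rules.append((set(X), set(Y), supp(X | Y), conf, lift))
```

Each tuple in rules is (antecedent, consequent, support, confidence, lift), the same quantities reported in the table.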

2.2 Visualization

2.2.1 Graph-based Visualization

Here we show a few plots to understand these rules better.

Recall that for a rule \(X \Rightarrow Y\), \(X\) is called the antecedent and \(Y\) the consequent.

This plot shows the rules (or itemsets) as a graph, with items as labeled vertices and each rule represented by its own vertex: arrows point from the LHS items to the vertex representing the rule, and from the rule vertex to the RHS item.

In this graph, the three broad allegation categories (neglect, abuse, and sexual abuse) form the core and appear to be the main distinguishing categories, while the items on the periphery are the sub-categories associated with them.

3 Built Environment Factors

Here we show the correlation patterns between child maltreatment counts within 500-ft grid cells and a number of built environment factors. The factors considered are listed below, with basic summary statistics for the grid-level counts.

3.0.1 Data Frame Summary

cm_lr_500ft
Dimensions: 14207 x 9
Duplicates: 14051

1. CM_Count [numeric]. Mean (sd): 0.3 (1.4); min < med < max: 0 < 0 < 52; IQR (CV): 0 (5.1). 29 distinct values.
2. Tattoo [numeric]. Min: 0; Mean: 0; Max: 1. 0: 14195 (99.9%), 1: 12 (0.1%).
3. MobHomes [numeric]. Mean (sd): 0 (0); min < med < max: 0 < 0 < 2; IQR (CV): 0 (63.1). 0: 14203 (100.0%), 1: 3 (0.0%), 2: 1 (0.0%).
4. MajDept [numeric]. Mean (sd): 0 (0.1); min < med < max: 0 < 0 < 2; IQR (CV): 0 (18.6). 0: 14163 (99.7%), 1: 40 (0.3%), 2: 4 (0.0%).
5. LiqStore [numeric]. Mean (sd): 0 (0.1); min < med < max: 0 < 0 < 4; IQR (CV): 0 (16.2). 0: 14150 (99.6%), 1: 3 (0.0%), 2: 51 (0.4%), 4: 3 (0.0%).
6. Hotel [numeric]. Mean (sd): 0 (0.1); min < med < max: 0 < 0 < 4; IQR (CV): 0 (18). 0: 14154 (99.6%), 1: 43 (0.3%), 2: 8 (0.1%), 3: 1 (0.0%), 4: 1 (0.0%).
7. PublicHS [numeric]. Min: 0; Mean: 0; Max: 1. 0: 14202 (100.0%), 1: 5 (0.0%).
8. BarbBeauty [numeric]. Mean (sd): 0 (0.4); min < med < max: 0 < 0 < 29; IQR (CV): 0 (12.7). 11 distinct values.
9. Banks [numeric]. Mean (sd): 0 (0.4); min < med < max: 0 < 0 < 9; IQR (CV): 0 (8.4). 0: 13896 (97.8%), 1: 176 (1.2%), 2: 73 (0.5%), 3: 23 (0.2%), 4: 14 (0.1%), 5: 11 (0.1%), 6: 8 (0.1%), 7: 4 (0.0%), 8: 1 (0.0%), 9: 1 (0.0%).

Figure 2.2 shows the correlation matrix.

3.1 Maps for Built Environment Factors

Child Maltreatment Reports by 500-ft grids

Banks by 500-ft grids

Liquor Stores by 500-ft grids

Hotels by 500-ft grids

4 ACS variables: tract level correlations

Next, we look at how child maltreatment counts correlate with a few selected ACS variables at the tract level. The variables are listed below, grouped by the direction of their correlation with child maltreatment counts. Figure 3.2 shows the correlation heatmap, with the variables ordered by a hierarchical clustering algorithm so that strongly correlated variables are grouped together.

Positive Correlation with child maltreatment counts

 1. Population size (NL) 
 2. Population density (p = .052) 
 3. % Under 18 
 4. % Black 
 5. % Non-White 
 6. % Hispanic 
 7. % Non-Married Family Households 
 8. % Female Headed Households 
 9. % Single Parent Households 
10. % Low Education level (less than high school) 
11. % Renter Occupied 
12. % Population under 18 in poverty 
13. % Population struggling 
14. % Not insured 
15. % On public insurance 
16. % Households with High household cost 

Negative Correlation with child maltreatment counts

1. % Population with a College Education 
2. % Own their home 

4.1 Maps

The following maps show the spatial distribution of a few selected variables that are correlated with child maltreatment counts at the tract level, as well as the rate, defined as the ratio of the count of child maltreatment reports to the total population size (NL).

4.2 Footnotes


  1. The Apriori algorithm [https://en.wikipedia.org/wiki/Apriori_algorithm] "proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database."